Abstract

Most programming languages use monitors with explicit signals for 
synchronization in shared-memory programs. Requiring program- 
mers to signal threads explicitly results in many concurrency bugs 
due to missed notifications, or notifications on wrong condition 
variables. In this paper, we describe an implementation of an au- 
tomatic signaling monitor in Java called AutoSynch that eliminates 
such concurrency bugs by removing the burden of signaling from 
the programmer. We show that the belief that automatic signaling 
monitors are prohibitively expensive is wrong. For most problems, 
programs based on AutoSynch are almost as fast as those based on 
explicit signaling. For some, AutoSynch is even faster than explicit 
signaling because it never uses signalAll, whereas the programmers 
end up using signalAll with the explicit signal mechanism. 

AutoSynch achieves efficiency in synchronization based on three 
novel ideas. We introduce an operation called globalization that 
enables the predicate evaluation in every thread, thereby reducing 
context switches during the execution of the program. Secondly, 
AutoSynch avoids signalAll by using a property called relay invari- 
ance that guarantees that whenever possible there is always at least 
one thread whose condition is true which has been signaled. Finally, 
AutoSynch uses a technique called predicate tagging to efficiently 
determine a thread that should be signaled. To evaluate the effi- 
ciency of AutoSynch, we have implemented many different well- 
known synchronization problems such as the producers/consumers 
problem, the readers/writers problems, and the dining philosophers 
problem. The results show that AutoSynch is almost as efficient as
the explicit-signal monitor and even more efficient in some cases.

Categories and Subject Descriptors D.1.3 [Concurrent Pro- 
gramming]: Parallel programming; D.3.3 [Language Constructs 
and Features]: Concurrent programming structures; classes and 
objects; control structures 

General Terms Algorithms, Languages, Performance 

Keywords automatic signal, explicit signal, implicit signal, mon- 
itor, concurrency, parallel 
1. Introduction

Multicore hardware is now ubiquitous. Programming these multi- 
core processors is still a challenging task due to bugs resulting from 
concurrency and synchronization. Although there is widespread 
acknowledgement of difficulties in programming these systems, 
it is surprising that by and large the most prevalent methods of 
dealing with synchronization are based on ideas that were developed
in the early 1970s. For example, the most widely used threads
package in C++, pthreads, and the most widely used threads package
in Java, java.util.concurrent, are based on the notion of monitors
(or semaphores). In this paper, we propose a new method called
AutoSynch, based on an automatic-signal monitor, that improves both
the productivity of the programmer and the performance of the system.

Both pthreads and Java require programmers to explicitly signal
threads that may be waiting on a certain condition. The programmer
has to explicitly declare condition variables and then signal one
or all of the threads when the associated condition becomes true.
Using the wrong notification (signal versus signalAll, or
notify versus notifyAll) is a frequent source of bugs in Java
multithreaded programs. In our proposed approach, AutoSynch, there is
no notion of condition variables, and it is the responsibility of the
system to signal appropriate threads. This feature significantly
reduces program size and complexity. In addition, it allows us to
completely eliminate signaling more than one thread, resulting in
fewer context switches and better performance. The idea of
automatic signaling was initially explored by Hoare in [14], but rejected
in favor of condition variables due to efficiency considerations. The
belief that automatic signaling is extremely inefficient compared to
explicit signaling has been widely held since then, and all prevalent
concurrent languages based on monitors use explicit signaling. For
example, Buhr, Fortier, and Coffin claim that automatic monitors are
10 to 50 times slower than explicit signals [4]. The reason for this
drastic slowdown in previous implementations of automatic monitors
is that they evaluate all possible conditions on which threads are
waiting whenever the monitor becomes available. We show in this
paper that the widely held belief is wrong.

With careful analysis of the conditions on which the threads
are waiting, and by evaluating as few conditions as possible, automatic
signaling can be as efficient as explicit signaling. In AutoSynch, the
programmer simply specifies the predicate P on which the thread is
waiting using the waituntil(P) statement. When a thread
executes the statement, it checks whether P is true. If it is true, the
thread can continue; otherwise, the thread must wait for the system
to signal it. The AutoSynch system has a condition manager that is
responsible for determining which thread to signal by analyzing the
predicates and the state of the shared object.
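To make the semantics of waituntil(P) concrete, the following is a minimal sketch of a naive automatic-signal monitor in plain Java. It is only a baseline for illustration, not the AutoSynch condition manager: the class names NaiveAutoMonitor and NaiveCounter are hypothetical, and the sketch wakes every waiter on exit (signalAll), which is exactly the cost that AutoSynch is designed to avoid. It also assumes that waitUntil is called before the caller modifies any shared state.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.BooleanSupplier;

// Hypothetical baseline, not the AutoSynch condition manager.
class NaiveAutoMonitor {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition stateChanged = lock.newCondition();

    void enter() { lock.lock(); }

    // Wake every waiter on exit so that each one re-checks its own predicate.
    void exit() {
        stateChanged.signalAll();
        lock.unlock();
    }

    // Must be called between enter() and exit(), before any state change.
    void waitUntil(BooleanSupplier p) throws InterruptedException {
        while (!p.getAsBoolean()) {
            stateChanged.await();   // releases the lock while waiting
        }
    }
}

// A shared counter whose remove(n) blocks until at least n units exist.
class NaiveCounter {
    private final NaiveAutoMonitor monitor = new NaiveAutoMonitor();
    private int count = 0;

    void add(int n) {
        monitor.enter();
        try {
            count += n;
        } finally {
            monitor.exit();
        }
    }

    // Returns the number of units left after removing n.
    int remove(int n) throws InterruptedException {
        monitor.enter();
        try {
            monitor.waitUntil(() -> count >= n);   // no condition variable in user code
            count -= n;
            return count;
        } finally {
            monitor.exit();
        }
    }
}
```

The user-level code contains no condition variables or signal calls; the price in this naive version is that every exit wakes all waiting threads.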



Fig. 1 shows the difference between the Java and the AutoSynch
implementations of the parameterized bounded-buffer problem,
a variant of the bounded-buffer problem (also known as the
producer-consumer problem). In this problem, producers put items
into the shared buffer, while consumers take items out of the buffer.
The put function has a parameter, items; the take function has a
parameter, num, indicating the number of items taken. There are
two requirements for synchronization. First, every operation on
a shared variable, such as buff, should be done under mutual
exclusion. Second, we need conditional synchronization; a producer
must wait when the buffer has insufficient space, and a consumer
must wait when the buffer has insufficient items. The
explicit-signal bounded-buffer is written in Java. A lock variable and two
associated condition variables are used to maintain mutual
exclusion and conditional synchronization. A thread needs to acquire the
lock before entering member functions. In addition, programmers
need to explicitly associate conditional predicates with condition
variables and call signal (signalAll) or await manually.
Note that the unlock statement should be placed in a finally block, and
try and catch blocks are also needed for the InterruptedException that
may be thrown by await. For simplicity, however, we omit
exception handling in Fig. 1. The automatic-signal bounded-buffer is
written using the AutoSynch framework. As in line 1, we use the AutoSynch
modifier to indicate that the class is a monitor; all member functions
of the class are executed under mutual exclusion. For conditional
synchronization, we use waituntil, as in line 9. There are no signal or
signalAll calls in the AutoSynch program. Clearly, the automatic-signal
monitor is much simpler than the explicit-signal monitor.
Figure 2: The framework of AutoSynch

To facilitate the automatic-signal mechanism in Java, we have
implemented the framework of AutoSynch illustrated in Fig. 2. The
framework is composed of a preprocessor and a Java condition
manager library. The preprocessor translates AutoSynch code into
traditional Java code. Our automatic-signal mechanism and the
techniques we developed are implemented in the Java condition manager
library, which is responsible for monitoring the state of the monitor
object and signaling an appropriate thread.

In this paper, we argue that automatic signaling is generally as
fast as explicit signaling (and even faster for some examples). In
Section 3, we give reasons for the efficiency of automatic
signaling. In short, explicit signaling has to resort to signalAll in some
examples, whereas our automatic signaling never uses signalAll.
Thus AutoSynch is considerably faster for synchronization
problems that require signalAll. The design principle underlying AutoSynch is
to reduce the number of context switches and predicate evaluations.

Context switch: A context switch requires a certain amount of
  time to save and load registers and update various tables and
  lists. Reducing unnecessary context switches boosts the
  performance of the system. A signalAll call introduces unnecessary
  context switches; therefore, signalAll calls are never used in
  AutoSynch.

Predicate evaluation: In the automatic-signal mechanism,
  signaling a thread is the responsibility of the system. The number of
  predicate evaluations is crucial for efficiency in deciding which
  thread should be signaled. By analyzing the structure of the
  predicate, our system reduces the number of predicate
  evaluations.

There are three important novel concepts in AutoSynch that
enable efficient automatic signaling: globalization of predicates,
relay invariance, and predicate tagging.

The technique of globalization of a predicate P is used to reduce
the number of context switches for its evaluation. In current
systems, only the thread that is waiting for the predicate P can evaluate
it. When the thread is signaled, it wakes up, acquires the lock to the
monitor, and then evaluates the predicate P. If the predicate P is
false, it goes back to waiting, resulting in an additional context switch.
In the AutoSynch system, the thread that is in the monitor evaluates the
condition for the waiting thread and wakes it only if the condition
is true. Since the predicate P may use variables local to the thread
waiting on it, the AutoSynch system derives a globalized predicate
P' of the predicate P, such that other threads can evaluate P'. The
details of globalization are in Section 4.1.

The idea of relay invariance is used to avoid signalAll calls in
AutoSynch. The relay invariance ensures that if there is any thread
whose waiting condition is true, then there exists at least one thread
whose waiting condition is true and is signaled by the system. With
this invariance, the signalAll call is unnecessary in our
automatic-signal mechanism. This mechanism reduces the number of context
switches by avoiding signalAll calls. The details of this approach
are in Section 4.2.

The idea of predicate tagging is used to accelerate the process
of deciding which thread to signal. All the waiting conditions are
analyzed and tags are assigned to every predicate according to its
semantics. To decide which thread should be signaled, we identify
tags that are most likely to be true after examining the current state
of the monitor. Then we only evaluate the predicates with those
tags. The details of predicate tagging are in Section 4.3.

Our experimental results indicate that AutoSynch can significantly
improve performance compared to other automatic-signal
mechanisms [4]. In [4], the automatic-signal mechanism is 10 to
50 times slower than the explicit-signal mechanism; however,
AutoSynch is only 2.6 times slower than the explicit-signal mechanism
even in the worst case of our experiments. Furthermore,
AutoSynch is 26.9 times faster than the explicit-signal mechanism in
the parameterized bounded-buffer problem, which relies on signalAll
calls. The experimental results also show that AutoSynch is
scalable: for many of the problems studied in this paper, its
performance remains stable as the number of threads increases.

Although the experimental results show that AutoSynch is 2.6
times slower than the explicit-signal mechanism in the worst case,
it is still desirable to have automatic signaling. First, automatic
signaling simplifies the task of concurrent programming. In an
explicit-signal monitor, it is the responsibility of programmers to explicitly
invoke a signal call on some condition variable for conditional
synchronization. Using the wrong notification and signaling the wrong
condition variable are frequent sources of bugs. The idea is
analogous to automatic garbage collection. Although garbage collection
decreases performance because of the overhead of deciding
which memory to free, programmers avoid manual memory
deallocation. As a consequence, memory leaks and certain bugs, such
as dangling pointers and double frees, are reduced. Similarly, an
automatic-signal mechanism consumes computing resources in
deciding which thread to signal, but programmers avoid explicitly
invoking signal calls. As a result, some bugs, such as using the wrong
notification and signaling the wrong condition variable, are
eliminated. Secondly, the explicit-signal monitor violates the
principle of separation of concerns. Any method that changes the state of
the monitor must be aware of all the conditions that other threads
could be waiting for in other methods of the monitor. The intricate
relation between threads for conditional synchronization breaks the
modularity and encapsulation of the program. Finally, AutoSynch
enables rapid prototyping and can shorten time to market.
Moreover, a correct automatic-signal implementation is helpful in
debugging an explicit-signal implementation.

Explicit-Signal

     1  class BoundedBuffer {
     2    Object[] buff;
     3    int putPtr, takePtr, count;
     4    Lock mutex = new ReentrantLock();
     5    Condition insufficientSpace = mutex.newCondition();
     6    Condition insufficientItem = mutex.newCondition();
     7    public BoundedBuffer(int n) {
     8      buff = new Object[n];
     9      putPtr = takePtr = count = 0;
    10    }
    11    public void put(Object[] items) {
    12      mutex.lock();
    13      while (items.length + count > buff.length) {
    14        insufficientSpace.await();
    15      }
    16      for (int i = 0; i < items.length; i++) {
    17        buff[putPtr++] = items[i];
    18        putPtr %= buff.length;
    19      }
    20      count += items.length;
    21      insufficientItem.signalAll();
    22      mutex.unlock();
    23    }
    24    public Object[] take(int num) {
    25      mutex.lock();
    26      while (count < num) {
    27        insufficientItem.await();
    28      }
    29      Object[] ret = new Object[num];
    30      for (int i = 0; i < num; i++) {
    31        ret[i] = buff[takePtr++];
    32        takePtr %= buff.length;
    33      }
    34      count -= num;
    35      insufficientSpace.signalAll();
    36      mutex.unlock();
    37      return ret;
    38    }
    39  }

Automatic-Signal

     1  AutoSynch class BoundedBuffer {
     2    Object[] buff;
     3    int putPtr, takePtr, count;
     4    public BoundedBuffer(int n) {
     5      buff = new Object[n];
     6      putPtr = takePtr = count = 0;
     7    }
     8    public void put(Object[] items) {
     9      waituntil(count + items.length <= buff.length);
    10      for (int i = 0; i < items.length; i++) {
    11        buff[putPtr++] = items[i];
    12        putPtr %= buff.length;
    13      }
    14      count += items.length;
    15    }
    16    public Object[] take(int num) {
    17      waituntil(count >= num);
    18      Object[] ret = new Object[num];
    19      for (int i = 0; i < num; i++) {
    20        ret[i] = buff[takePtr++];
    21        takePtr %= buff.length;
    22      }
    23      count -= num;
    24      return ret;
    25    }
    26  }

Figure 1: The parameterized bounded-buffer example

Although this paper focuses on Java, our techniques are also
applicable to other programming languages and models, such as
pthreads and C#.

This paper is organized as follows. Section 2 gives the background
of the monitor. Section 3 explains why signalAll is required
for the explicit-signal monitor but not for the automatic-signal monitor.
The concepts of AutoSynch are presented in Section 4, and the practical
implementation details are discussed in Section 5. The proposed
methods are then evaluated with experiments in Section 6. Section
7 gives the concluding remarks.

2. Background: monitor 

A monitor is an abstract object or module containing shared data to be
used safely by multiple threads in concurrent programming. A monitor
can be defined by two characteristics: mutual exclusion and
conditional synchronization. Mutual exclusion guarantees that at
most one thread can execute any member function of a monitor at
any time. Threads acquire the lock of the monitor to gain the
privilege of accessing it. Conditional synchronization maintains
the execution order between threads. Threads may wait for some
condition to be met and release the monitor lock temporarily.
After the condition has been met, threads re-acquire the lock and
continue to execute. According to Buhr and Harji [5], monitors can
be divided into two categories according to their implementations
of conditional synchronization.

Explicit-signal monitor In this type of monitor, condition vari- 
ables, signal and await statements are used for synchronization. 
Programmers need to associate assertions with condition vari- 
ables manually. This mechanism involves two or more threads. 
A thread waits on some condition variable if its predicate is not 
true. When another thread detects that the state has changed and 
the predicate is true, it explicitly signals the appropriate condi- 
tion variable. 

Automatic-signal (implicit-signal) monitor This kind of monitor
  uses waituntil statements, such as line 9 in the automatic-signal
  program in Fig. 1, instead of condition variables for
  synchronization. Programmers do not need to associate assertions with
  variables, but use waituntil statements directly. In such a monitor, a
  thread waits as long as the condition of a waituntil statement
  is false, and executes the remaining tasks only after the condition
  becomes true. The responsibility for signaling a waiting thread
  lies with the system rather than with the programmers.

3. signalAll requirement in explicit-signal monitors

The signalAll call is essential in the explicit-signal mechanism when
programmers do not know which thread should be signaled. In
Fig. 1, a producer must wait if there is not enough space for the
items it wants to put, while a consumer has to wait when the buffer has
insufficient items. Since producers and consumers can put and take different
numbers of items every time, they may wait for different conditions
to be met. Programmers do not know which producer or consumer
should be signaled at runtime. Therefore, the signalAll call is used
instead of signal calls in lines 21 and 35. Programmers
can avoid using signalAll calls by writing complicated code that
associates different conditions with different condition variables, but
such code makes the program harder to maintain.

The signalAll call is expensive; it may decrease performance
because it introduces redundant context switches, each requiring
computing time to save and load registers and update various tables
and lists. Furthermore, signalAll calls cannot increase parallelism
because threads are forbidden to access a monitor simultaneously.
Although multiple threads are signaled at a time, only one thread
is able to acquire the monitor. The other threads may need to go back
to the waiting state since another thread may have changed the state of the
monitor. Suppose that, in Fig. 1, the buffer has 64 items after a producer
finishes a put call. The producer calls insufficientItem.signalAll() in
line 21 before completing the call. Ten waiting consumers are
signaled; each of them is waiting to take 48 items. Suppose the
consumer C re-acquires the lock first and takes 48 items. The remaining
16 items are insufficient for the other threads; they make
context switches, re-evaluate their predicates, and go back to the waiting
state. These context switches are redundant since the 9 threads do
not make any progress but only go back to the waiting state. Therefore,
if we avoid using the signalAll call and only signal a thread that is
most likely to make progress, the unnecessary context switches can
be reduced.
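The arithmetic of this example can be checked with a small, single-threaded sketch. It does not model real scheduling; it only assumes that the signaled consumers acquire the monitor one after another and counts how many of them find their predicate false. The class and method names are hypothetical.

```java
// Hypothetical, single-threaded model of the signalAll example above.
class SignalAllCost {
    // Counts signaled consumers that must go back to waiting without progress.
    static int redundantWakeups(int itemsInBuffer, int signaledConsumers, int itemsPerTake) {
        int count = itemsInBuffer;
        int redundant = 0;
        for (int i = 0; i < signaledConsumers; i++) {
            if (count >= itemsPerTake) {
                count -= itemsPerTake;   // this consumer makes progress
            } else {
                redundant++;             // woke up only to wait again
            }
        }
        return redundant;
    }

    public static void main(String[] args) {
        // 64 items, 10 signaled consumers, each waiting to take 48 items.
        System.out.println(redundantWakeups(64, 10, 48)); // prints 9
    }
}
```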



4. AutoSynch concepts 
4.1 Predicate evaluation 

In AutoSynch, it is the responsibility of the system to signal
appropriate threads automatically. Predicate evaluation is crucial in
deciding which thread should be signaled. We discuss how to
perform predicate evaluations of waituntil statements.

A predicate $P(x): X \to \mathbb{B}$ is a Boolean condition, where
$X$ is the space spanned by the variables $x = (x_1, \ldots, x_n)$. A
variable of a monitor object is a shared variable if it is accessible
by every thread that is accessing the monitor. The set of shared
variables is denoted by $S$. The set of local variables, denoted by
$L$, is accessible only by a thread calling a function in which the
variables are declared.

Predicates can be used to describe the properties of conditions.
In our approach, every condition of a waituntil statement is
represented by a predicate. We say a condition has been met if its
representing predicate is true; otherwise, the predicate is false.
Furthermore, we assume that every predicate, $P = \bigvee_{i=1}^{n} c_i$, is in
disjunctive normal form (DNF), where $c_i$ is defined as the conjunction
of a set of atomic Boolean expressions. For example, the predicate
$(x = 1) \land (y = 6) \lor (z \neq 8)$ is in DNF, where $c_1 = (x = 1) \land (y = 6)$
and $c_2 = (z \neq 8)$. Note that every Boolean formula can be
converted into DNF using De Morgan's laws and the distributive law.
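As an illustration of the DNF representation, the sketch below stores a predicate as a list of conjunctions, each a list of atomic Boolean expressions, and evaluates it. The class names DnfPredicate and DnfDemo and the static fields x, y, z are hypothetical and are not part of AutoSynch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical helper: a predicate stored as a disjunction of conjunctions.
class DnfPredicate {
    private final List<List<BooleanSupplier>> conjunctions = new ArrayList<>();

    // Adds one conjunction c_i, given as its atomic Boolean expressions.
    DnfPredicate or(BooleanSupplier... atoms) {
        List<BooleanSupplier> c = new ArrayList<>();
        for (BooleanSupplier a : atoms) {
            c.add(a);
        }
        conjunctions.add(c);
        return this;
    }

    // True if at least one conjunction has all of its atoms true.
    boolean evaluate() {
        for (List<BooleanSupplier> c : conjunctions) {
            boolean allTrue = true;
            for (BooleanSupplier atom : c) {
                if (!atom.getAsBoolean()) {
                    allTrue = false;
                    break;
                }
            }
            if (allTrue) {
                return true;
            }
        }
        return false;
    }
}

class DnfDemo {
    static int x, y, z;

    // (x = 1) AND (y = 6)  OR  (z != 8)
    static DnfPredicate example() {
        return new DnfPredicate()
            .or(() -> x == 1, () -> y == 6)
            .or(() -> z != 8);
    }

    public static void main(String[] args) {
        x = 1; y = 6; z = 8;
        System.out.println(example().evaluate()); // true: c_1 holds
        y = 0;
        System.out.println(example().evaluate()); // false: neither conjunction holds
    }
}
```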

Predicates can be divided into two categories based on the type
of their variables.

Definition 1 (Shared and complex predicate). Consider a predicate
$P(x): X \to \mathbb{B}$. If $X \subseteq S$, $P$ is a shared predicate. Otherwise, it
is a complex predicate.

The automatic-signal monitor has an efficient implementation
when the predicate of a waituntil statement is limited to a shared predicate;
however, we do not impose this limitation on the predicate of a waituntil
statement. The reason is that this limitation would make
AutoSynch less attractive and practical, since conditions that include
local variables could not be represented in AutoSynch.

Evaluating a complex predicate in all threads is not directly possible
because the accessibility of the local variables in the predicate
is limited to the thread declaring them. To evaluate a complex
predicate in all threads, we treat local variables as constant values
at runtime and define globalization as follows.

Definition 2 (Globalization). Given a complex predicate
$P(x, a): X \times A \to \mathbb{B}$, where $X \subseteq S$ and $A \subseteq L$, the
globalization of $P$ at runtime $t$ is the new shared predicate

$$G_t(x) = P(x, a_t),$$

where $a_t$ is the value of $a$ at runtime $t$.

Globalization can be applied to any complex predicate; a
shared predicate can be derived from the globalization. For
example, in Fig. 1, the consumer C wants to take 48 items at some
instant of time. Applying globalization to the complex
predicate $(\mathit{count} \geq \mathit{num})$ in line 17 of the automatic-signal
program, we derive the shared predicate $(\mathit{count} \geq 48)$.
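In Java, globalization can be pictured as capturing the current value of a local variable in a closure, so that the resulting predicate depends only on shared state. The sketch below is one possible illustration under that assumption; the class Globalization and the method globalize are hypothetical names, and the AutoSynch preprocessor may realize the idea differently.

```java
import java.util.function.IntPredicate;

// Hypothetical illustration of globalization via value capture.
class Globalization {
    // Snapshots the local value num at the time of the call; the returned
    // predicate takes only the shared value count as input.
    static IntPredicate globalize(int num) {
        final int numAtWait = num;              // a_t: value of the local variable
        return count -> count >= numAtWait;     // G_t(count) = P(count, a_t)
    }

    public static void main(String[] args) {
        IntPredicate atLeast48 = globalize(48); // consumer C wants 48 items
        System.out.println(atLeast48.test(16)); // false
        System.out.println(atLeast48.test(64)); // true
    }
}
```

Because the captured value cannot change, any thread inside the monitor can evaluate the returned predicate on behalf of the waiting thread.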

The following proposition shows that the complex predicate 
evaluation of waituntil statement in all threads can be achieved 
through the globalization. 

Proposition 1. Consider a complex predicate $P(x, a)$ in a
waituntil statement. $P(x, a)$ and its globalization $P(x, a_t)$ are
semantically equivalent during the waituntil period, where $t$ is the time
instant immediately before invoking the waituntil statement.

Proof. Only the thread invoking the waituntil statement can access
the local variables of the predicate; all other threads are unable to
change the values of those local variables. Therefore, the value of $a$
cannot be changed during the waituntil period. Since $a_t$ is the value
of $a$ immediately before invoking the waituntil statement, $P(x, a)$
and $P(x, a_t)$ are semantically equivalent during the waituntil period.

■ 

Proposition 1 enables the evaluation of the complex predicate of a
waituntil statement in all threads. Given a complex predicate in a
waituntil statement, in the sequel we substitute all the local variables
with their values immediately before invoking the statement. The
predicate can now be evaluated in all other threads during the
waituntil period.

4.2 Relay invariance 

As mentioned in Section 3, signalAll calls are sometimes unavoidable
in the explicit-signal mechanism. In AutoSynch, signalAll calls
are avoided by providing the relay invariance.

Definition 3 (Active and inactive thread). Consider a thread that 
tries to access a monitor. If it is not waiting in a waituntil statement 
or has been signaled, then it is an active thread for the monitor. 
Otherwise, it is an inactive thread. 

Definition 4 (Relay invariance). If there is a thread waiting for a
predicate that is true, then there is at least one active thread; i.e.,
if $W_t$ is the set of waiting threads whose conditions have
become true and $A_t$ is the set of active threads, then

$$W_t \neq \emptyset \implies A_t \neq \emptyset$$

holds at all times.

AutoSynch uses the following mechanism for signaling. 

Relay signaling rule: When a thread exits a monitor or goes
into the waiting state, it checks whether there is some thread waiting
on a condition that has become true. If at least one such waiting thread
exists, it signals that thread.



Proposition 2. The relay signaling rule guarantees relay invari- 
ance. 

Proof. Suppose a thread T is waiting on the predicate P that is true. 
Since T is waiting on P, P must be false before T went to waiting 
state. There must exist another active thread R after T such that R 
changed the state of the monitor and made P true. According to 
the rule, R must signal T or another thread waiting for a condition 
that is true before leaving the monitor or going into waiting state. 
The thread signaled by R then becomes active. Therefore, the relay 
invariance holds. 



The concept behind relay invariance is that the privilege to enter
the monitor is transmitted from one thread to another thread whose
condition has become true. For example, in Fig. 1, the consumer C
tries to take 32 items; however, only 24 items are in the buffer at
this moment. Then, C waits for the predicate $P: (\mathit{count} \geq 32)$ to
be true. A producer, D, becomes active after C; D puts 16 items
into the buffer and then leaves the monitor. Before leaving, D finds
that P is true and then signals C; therefore, C becomes active
again and takes 32 items from the buffer. Proposition 2 shows that
the relay invariance holds in our automatic-signaling mechanism.
Thus, signalAll calls are avoidable in AutoSynch. The problem is
now reduced to finding a thread waiting for a condition that is true.
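The relay signaling rule can be sketched in plain Java as follows. This is a simplified illustration under stated assumptions, not the AutoSynch condition manager: the names RelayMonitor, RelayCounter, and RelayDemo are hypothetical, every waiter gets its own condition variable, and the thread that exits or starts waiting scans the waiters linearly (AutoSynch uses predicate tags to avoid such a scan). Because the lock is not fair, a woken thread re-checks its predicate before proceeding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the relay signaling rule; not the AutoSynch library.
class RelayMonitor {
    private static final class Waiter {
        final BooleanSupplier predicate;   // globalized: captures local values
        final Condition condition;         // private condition for this waiter
        boolean signaled = false;
        Waiter(BooleanSupplier predicate, Condition condition) {
            this.predicate = predicate;
            this.condition = condition;
        }
    }

    private final ReentrantLock lock = new ReentrantLock();
    private final List<Waiter> waiters = new ArrayList<>();

    void enter() { lock.lock(); }

    // Exiting the monitor: pass the relay to one waiter whose predicate is true.
    void exit() {
        relaySignal();
        lock.unlock();
    }

    // Must be called between enter() and exit().
    void waitUntil(BooleanSupplier p) throws InterruptedException {
        while (!p.getAsBoolean()) {
            Waiter w = new Waiter(p, lock.newCondition());
            waiters.add(w);
            relaySignal();                 // going into the waiting state
            try {
                while (!w.signaled) {
                    w.condition.await();   // releases the lock while waiting
                }
            } catch (InterruptedException e) {
                waiters.remove(w);
                relaySignal();             // do not drop a relay we may hold
                throw e;
            }
            waiters.remove(w);
        }
    }

    // Signals at most one not-yet-signaled waiter whose predicate is true.
    private void relaySignal() {
        for (Waiter w : waiters) {
            if (!w.signaled && w.predicate.getAsBoolean()) {
                w.signaled = true;
                w.condition.signal();
                return;
            }
        }
    }
}

class RelayCounter {
    private final RelayMonitor monitor = new RelayMonitor();
    private int count;
    RelayCounter(int initial) { count = initial; }

    void put(int n) {
        monitor.enter();
        try { count += n; } finally { monitor.exit(); }
    }

    // Returns the number of items left after taking n.
    int take(int n) throws InterruptedException {
        monitor.enter();
        try {
            monitor.waitUntil(() -> count >= n);
            count -= n;
            return count;
        } finally {
            monitor.exit();
        }
    }
}

class RelayDemo {
    public static void main(String[] args) throws InterruptedException {
        RelayCounter buffer = new RelayCounter(24);   // 24 items in the buffer
        Thread c = new Thread(() -> {
            try {
                System.out.println("left after take: " + buffer.take(32));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        c.start();
        buffer.put(16);                               // producer D puts 16 items
        c.join();
    }
}
```

The demo mirrors the example in the text: the consumer asks for 32 items when only 24 are available, and the producer's exit from the monitor is what signals it. No signalAll call appears anywhere in the sketch.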

4.3 Predicate tag 

In order to efficiently find an appropriate thread waiting for a 
predicate that is true, we analyze every waiting condition and assign 
different tags to every predicate according to its semantics. These 
tags help us prune predicates that are not true by examining the 
state of the monitor. The idea behind the predicate tag is that, 
local variables cannot be changed during the waituntil period; thus 
the values of local variables are used as keys when we evaluate 
predicates. First, we define two types of predicates according to 
their semantics. 

Definition 5 (Local and shared expression). Consider an
expression $E(x): X \to D$, where $D$ represents one of the primitive data
types in Java. If $X \subseteq L$, then $E$ is a local expression. Otherwise, if
$X \subseteq S$, $E$ is a shared expression.

We use SE to denote a shared expression, and LE to denote a 
local expression. 

Definition 6 (Equivalence predicate). A predicate $P: (SE = LE)$
is an equivalence predicate.

Definition 7 (Threshold predicate). A predicate $P: (SE \; op \; LE)$
is a threshold predicate, where $op \in \{<, \leq, >, \geq\}$.

Note that many predicates that are not equivalence or threshold
predicates can be transformed into them. Consider the predicate
$(x - a = y + b)$, where $x, y \in S$ and $a, b \in L$. This predicate is
equivalent to $(x - y = a + b)$, which is an equivalence predicate.
Thus, these two types of predicates can represent a wide range of
conditions in synchronization problems.

Given an equivalence or a threshold predicate, we can apply
the globalization operation to derive a constant value on the
right-hand side of the predicate. In AutoSynch, there are three types of
tags: Equivalence, Threshold, and None. Every Equivalence
or Threshold tag represents an equivalence predicate or a
threshold predicate, respectively. If the predicate is neither equivalence
nor threshold, it acquires the None tag. For example, consider the
threshold predicate $x + b > 2y + a$, where $a$ and $b$ are local
variables with values 11 and 2. We first use globalization
to convert it to $(x - 2y > 9)$, which is represented by the tag
(Threshold, $x - 2y$, 9, $>$). The formal definition of a tag is as
follows.

Definition 8. A tag is a four-tuple $(M, expr, key, op)$, where

• $M \in \{$Equivalence, Threshold, None$\}$;

• $expr$ is a shared expression if $M \in \{$Equivalence, Threshold$\}$;
otherwise, $expr = \bot$;

• $key$ is the value of a local expression after applying
globalization if $M \in \{$Equivalence, Threshold$\}$; otherwise,
$key = \bot$;

• $op \in \{<, \leq, >, \geq\}$ if $M =$ Threshold; otherwise, $op = \bot$.

We say that a tag is true (false) if the predicate that the tag
represents is true (false).

4.3.1 Predicate tagging 

A tag is assigned to every conjunction. The tags of the conjunctions
of a predicate constitute the set of tags of the predicate. Tags are
given to every predicate by the algorithm shown in Fig. 3. When
assigning a tag to a conjunction, the equivalence tag has the highest
priority. The reason is that the set of values that make an equivalence
predicate true is smaller than the set of values that make a threshold
predicate true. The equivalence predicate is true only when its
shared expression equals a specific value. For example, consider an
equivalence predicate $x = 8$ and a threshold predicate $x > 3$. The
predicate $x = 8$ is true only when the value of $x$ is 8, whereas
$x > 3$ is true for a much larger set of values. Therefore,
Equivalence tags can help us prune predicates that are false more
efficiently than other kinds of tags. If a conjunction does not include
any equivalence predicate, then we check whether it includes any
threshold predicate. If yes, then a Threshold tag is assigned to the
conjunction; otherwise, the conjunction has a None tag.



tags = empty
foreach conjunction c
    if c contains an equivalence predicate se = le
        tag t = (Equivalence, se, globalization(le), null)
    else if c contains a threshold predicate se op le
        tag t = (Threshold, se, globalization(le), op)
    else
        tag t = (None, null, null, null)
    add t to tags
return tags



Figure 3: Predicate Tagging 
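The algorithm of Fig. 3 could be written in Java roughly as follows. This is a sketch under simplifying assumptions: each atomic expression is assumed to be already classified and globalized, the shared expression is represented only by its text, and keys are integers. The names PredicateTagging, Kind, Atom, Tag, tagOf, and tagsOf are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the tagging algorithm in Fig. 3.
class PredicateTagging {
    enum Kind { EQUIVALENCE, THRESHOLD, NONE }

    // One atomic expression "expr op key" after globalization;
    // kind NONE marks an atom that is neither equivalence nor threshold.
    static final class Atom {
        final Kind kind; final String expr; final int key; final String op;
        Atom(Kind kind, String expr, int key, String op) {
            this.kind = kind; this.expr = expr; this.key = key; this.op = op;
        }
    }

    // The four-tuple (M, expr, key, op); null stands for "undefined".
    static final class Tag {
        final Kind kind; final String expr; final Integer key; final String op;
        Tag(Kind kind, String expr, Integer key, String op) {
            this.kind = kind; this.expr = expr; this.key = key; this.op = op;
        }
        @Override public String toString() {
            return "(" + kind + ", " + expr + ", " + key + ", " + op + ")";
        }
    }

    // Equivalence has priority over threshold; otherwise the tag is None.
    static Tag tagOf(List<Atom> conjunction) {
        Atom threshold = null;
        for (Atom a : conjunction) {
            if (a.kind == Kind.EQUIVALENCE) {
                return new Tag(Kind.EQUIVALENCE, a.expr, a.key, null);
            }
            if (a.kind == Kind.THRESHOLD && threshold == null) {
                threshold = a;
            }
        }
        if (threshold != null) {
            return new Tag(Kind.THRESHOLD, threshold.expr, threshold.key, threshold.op);
        }
        return new Tag(Kind.NONE, null, null, null);
    }

    // One tag per conjunction of a predicate in DNF.
    static List<Tag> tagsOf(List<List<Atom>> dnf) {
        List<Tag> tags = new ArrayList<>();
        for (List<Atom> c : dnf) {
            tags.add(tagOf(c));
        }
        return tags;
    }

    public static void main(String[] args) {
        // (y > 3) AND (x = 8)  OR  (x - 2y > 9)
        List<Atom> c1 = Arrays.asList(
            new Atom(Kind.THRESHOLD, "y", 3, ">"),
            new Atom(Kind.EQUIVALENCE, "x", 8, "="));
        List<Atom> c2 = Arrays.asList(new Atom(Kind.THRESHOLD, "x - 2y", 9, ">"));
        System.out.println(tagsOf(Arrays.asList(c1, c2)));
    }
}
```

In the demo, the first conjunction receives an Equivalence tag on x even though its threshold atom comes first, which reflects the priority rule described above.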

Creating all tags for a conjunction is unnecessary. If a
conjunction includes multiple equivalence predicates or threshold
predicates, only one arbitrary Equivalence tag or Threshold tag is
assigned to the conjunction. If there are a large number of tags,
then the performance may decrease because of the cost of
maintaining tags. As a result, we assign only one tag to every
conjunction. Assigning multiple tags to a conjunction cannot
accelerate the searching process. For example, consider a conjunction
$(x = 8) \land (y = 9)$. If only a tag (Equivalence, $x$, 8, null) is
assigned to the conjunction, we check the predicate when the tag
is true. Adding another tag (Equivalence, $y$, 9, null) cannot
accelerate the searching process since we need to check both tags.

Note that multiple predicates with a shared conjunct may share
a tag. For example, the predicates $(x = 5) \land (z < 4)$ and
$(x = 5) \land (y > 4)$ would have a shared equivalence tag of $(x = 5)$.

4.3.2 Tag signaling 

The signaling mechanism in AutoSynch is based on tags. Since the
equivalence tag is more efficient in pruning the search space than
the threshold tag, the predicates with equivalence tags are checked prior
to the predicates with other tags. If no predicate that is true can
be found after checking Equivalence tags and Threshold tags,
our algorithm does an exhaustive search of the predicates with a
None tag.

Equivalence tag signaling: Observe that an equivalence predicate
becomes true only when its shared expression equals the specific
value of its local expression after applying globalization. For
distinct equivalence tags related to the same shared expression, at
most one tag can be true at a time because the value of the shared
expression is deterministic and unique at any time. By observing
the value of the shared expression, the appropriate tag can be
identified. For example, suppose there are three Equivalence tags for
predicates x = 3, x = 6, and x = 8. We examine x and find that
its value is 8. Then we know that only the third predicate x = 8 is
true. Based on this observation, for each unique shared expression
of an equivalence tag, we create a hash table, where the value of
the local expression is used as the key. By using this hash table and
evaluating the shared expression at runtime, we can find a tag that
is true in O(1) time if there is any. Then we check the predicates
with the tag.
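
The hash-table lookup described above can be sketched in Java as follows. The class EquivalenceTagTable and its method names are assumptions made for illustration; predicates are modeled as BooleanSupplier values and the shared expression as a LongSupplier, which is a simplification of whatever representation AutoSynch actually uses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

// Hypothetical sketch: one table per shared expression.
final class EquivalenceTagTable {
    private final LongSupplier sharedExpr;  // evaluates, e.g., x
    private final Map<Long, List<BooleanSupplier>> byKey = new HashMap<>();

    EquivalenceTagTable(LongSupplier sharedExpr) {
        this.sharedExpr = sharedExpr;
    }

    // Registers a predicate under the tag (Equivalence, sharedExpr, key).
    void add(long key, BooleanSupplier predicate) {
        byKey.computeIfAbsent(key, k -> new ArrayList<>()).add(predicate);
    }

    // Returns a predicate that is currently true, or null if there is none.
    BooleanSupplier findTrue() {
        // At most one tag can match the current value of the shared expression.
        List<BooleanSupplier> candidates = byKey.get(sharedExpr.getAsLong());
        if (candidates == null) return null;
        for (BooleanSupplier p : candidates) {
            // The tag is only a filter; the whole predicate must still hold.
            if (p.getAsBoolean()) return p;
        }
        return null;
    }
}
```

The lookup of the tag is a single expected constant-time hash probe; the cost that remains is evaluating the predicates that share the matching tag.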

Threshold tag signaling: Consider the following example. Suppose
there are two predicates, x > 5 and x > 3. We know that if
x > 3 is false, then x > 5 cannot be true. Hence, we only need
to check the predicate with the smallest local expression value for
> and ≥ operations. Furthermore, consider the predicates with the
same local expression value but different operations, x ≥ 3 and
x > 3. The predicate x > 3 cannot be true when x ≥ 3 is false;
i.e., we only need to check the predicate x ≥ 3. We use a min-heap
data structure for storing the threshold tags related to the same
shared expression with op ∈ {>, ≥}. If two predicates have the
same local expression value but different operations, then the
predicate with ≥ is considered to have a smaller value than the
predicate with > in the min-heap. Similarly, a max-heap can be used
for threshold tags with op ∈ {<, ≤}.

The signaling mechanism for Threshold tags is shown in Fig. 4.
In general, the tag in the root of a heap is checked. If the tag is
false, all the descendant nodes are also false. Otherwise, all
predicates with the tag need to be checked for finding a true predicate.
To maintain correctness, if no predicate is true, the tag is removed
from the heap temporarily. Then the tag in the position of
the new heap root is checked again until a true predicate is found
or a false tag is found. The tags removed temporarily are then
reinserted into the heap. The reason to remove the tags is that the
descendants of the tags may also be true since the tags are true. So
we also need to check the descendant tags. For example, consider
the predicates P1 : (x > 5) ∧ (y ≠ 1) and P2 : (x > 7).
P1 has the tag Q1 : (Threshold, x, 5, >) and P2 has the tag
Q2 : (Threshold, x, 7, >). Q1 is the root and Q2 is its descendant.
Suppose at some time instant x = 3, then Q1 is false; thus,
there is no need to check Q2. Now, suppose x = 9 and y = 1, then
Q1 is true. We check all predicates that have tag Q1. Since P1 is
false, no predicate having tag Q1 is true. Then Q1 is removed from
the heap temporarily. We find the new root Q2 is true and P2 that
has tag Q2 is also true. We signal a thread waiting for P2 and then
add Q1 back to the heap.

Suppose there are n Threshold tags for a shared expression
with different keys. Suppose that these tags are assigned to m
predicates. The time complexity for maintaining the heap is O(n log(n)).
However, the performance is generally much better because we
only need to check the predicates of the tags in the root position
in most cases. The time complexity for finding the root is O(1).
In the worst case, we need to check all predicates; thus, the time
complexity is O(n log(n) + m). However, this situation is rare.



// peek(): retrieves but does not remove the root
// poll(): retrieves and removes the root
list backup = empty
tag t = heap.peek()
while t is true
    foreach predicate p with t
        if p is true
            signal a thread waiting on p
            foreach b in backup
                heap.add(b)
            return
    backup.insert(heap.poll())
    t = heap.peek()
foreach b in backup
    heap.add(b)



Figure 4: Threshold tag signaling 



Furthermore, this algorithm is optimized for evaluating threshold 
predicates by sacrificing performance in tag management. 
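
A minimal Java sketch of the loop in Figure 4, restricted to lower-bound tags (op is > or ≥), is given below. It uses java.util.PriorityQueue as the min-heap and orders a ≥ tag before a > tag with the same key. ThresholdTag and LowerBoundHeap are hypothetical names, predicates are modeled as BooleanSupplier values, and the method returns the predicate to signal instead of signaling a thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.BooleanSupplier;

// Hypothetical tag for "shared > key" or "shared >= key".
final class ThresholdTag {
    final long key;           // globalized local expression value
    final boolean inclusive;  // true for >=, false for >
    final List<BooleanSupplier> predicates = new ArrayList<>();

    ThresholdTag(long key, boolean inclusive) {
        this.key = key;
        this.inclusive = inclusive;
    }

    boolean holds(long shared) {
        return inclusive ? shared >= key : shared > key;
    }
}

final class LowerBoundHeap {
    // Smaller keys first; on equal keys, >= sorts before >.
    private final PriorityQueue<ThresholdTag> heap = new PriorityQueue<>(
        Comparator.comparingLong((ThresholdTag t) -> t.key)
                  .thenComparingInt(t -> t.inclusive ? 0 : 1));

    void add(ThresholdTag t) {
        heap.add(t);
    }

    // Returns a true predicate to signal, or null if none is true.
    BooleanSupplier findTrue(long shared) {
        List<ThresholdTag> backup = new ArrayList<>();
        BooleanSupplier found = null;
        ThresholdTag t = heap.peek();
        // Stop at the first false root: every remaining tag is false too.
        while (found == null && t != null && t.holds(shared)) {
            for (BooleanSupplier p : t.predicates) {
                if (p.getAsBoolean()) {
                    found = p;
                    break;
                }
            }
            if (found == null) {
                backup.add(heap.poll());  // tag is true but no predicate is
                t = heap.peek();
            }
        }
        heap.addAll(backup);  // reinsert temporarily removed tags
        return found;
    }
}
```

With tags for keys 5 and 7 on x, a call with x = 3 inspects only the root, while a call with x = 9 falls through to the second tag when the predicates under the first tag are false.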

5. AutoSynch implementation 

The AutoSynch implementation involves two parts: the preprocessor
and the Java condition manager library. The preprocessor,
built using JavaCC [17], translates AutoSynch code to Java code.
Our signaling mechanism is implemented in the condition manager
library, which creates condition variables and maintains the association
between predicates and condition variables. Furthermore, predicate
tags are also maintained by the condition manager. It is the
responsibility of the condition manager to decide which thread should be
signaled.

5.1 Preprocessor 

The AutoSynch class provides both mutual exclusion and conditional
synchronization. To maintain these two properties, our preprocessor
adds some additional variables to every AutoSynch class.
Fig. 5 summarizes the definitions of the additional variables in the
constructor of an AutoSynch class. The lock variable, mutex, is
declared for mutual exclusion; it is acquired at the beginning of
every member function and released before the return statement. In
addition, a condition manager, condMgr, is declared for
synchronization. The details of the condition manager are discussed in the
next section.



Lock mutex
ConditionManager condMgr
foreach shared predicate P
    tags = AnalyzePredicate(toDNF(P))
    condMgr.registerSharedPredicate(P, tags)
foreach shared expression E
    condMgr.registerSharedExpression(E)



Figure 5: The additional variables for an AutoSynch class 

All predicates are transformed to DNF in the preprocessing
stage by applying De Morgan's laws and the distributive law. Then we
analyze the predicates to derive their tags. The condition manager
registers the predicates and shared expressions for predicate evaluation.
The shared predicates and shared expressions are identified in the
preprocessing stage and added in the constructor of the class as
in Fig. 5. We add shared predicates and shared expressions (but
not complex predicates) in the constructor because their semantics
is static and never changes. A complex predicate is registered
dynamically because its globalization may change according to the
value of its local variables at runtime. In Java, the shared
predicates and shared expressions are created as inner classes that can
access the shared variables appearing in them, with isTrue() or
getValue() functions for the condition manager to evaluate. The
function isTrue() returns the evaluation of the shared predicate
and the function getValue() returns the value of the shared
expression.
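
To make the inner-class idea concrete, here is a small Java sketch for a bounded buffer. The method names isTrue() and getValue() follow the text; the interfaces SharedPredicate and SharedExpression, the class BoundedBufferState, and the setCountForDemo helper are assumptions made for this example and may differ from the code the preprocessor really generates.

```java
// Hypothetical interfaces; only the method names come from the text.
interface SharedPredicate {
    boolean isTrue();
}

interface SharedExpression {
    long getValue();
}

final class BoundedBufferState {
    private final Object[] buff;
    private int count;  // shared variable of the monitor

    BoundedBufferState(int capacity) {
        buff = new Object[capacity];
    }

    // Inner classes can read the monitor's shared variables directly,
    // so the condition manager can evaluate them from any thread.
    final class NotFull implements SharedPredicate {
        public boolean isTrue() { return count < buff.length; }
    }

    final class NotEmpty implements SharedPredicate {
        public boolean isTrue() { return count > 0; }
    }

    final class Count implements SharedExpression {
        public long getValue() { return count; }
    }

    final SharedPredicate notFull = new NotFull();
    final SharedPredicate notEmpty = new NotEmpty();
    final SharedExpression countExpr = new Count();

    // Demo-only mutator so the predicates can be exercised in isolation.
    void setCountForDemo(int c) {
        count = c;
    }
}
```

Because the predicate objects are created once and never change, they fit the description of shared predicates being registered in the constructor.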

For every member function of an AutoSynch class, mutex.lock()
and mutex.unlock() are inserted at the beginning of the
function and immediately before the return statement, respectively.

In the waituntil statement, the predicate is checked initially. If
it is true, then the thread can continue. Otherwise, the type of
predicate is checked. If the predicate is complex, then we apply
globalization to it to derive a new shared predicate. Then we query
the condition manager to determine whether the derived predicate
has been added earlier. If not, we add the predicate with its tags to
the condition manager. Then, the corresponding condition variable
can be obtained by calling the getCondition() function of the condition
manager. The relaySignal() function maintains relay invariance and
signals an appropriate thread. Then, the thread goes into the waiting
state until the predicate becomes true. After exiting the waiting
state, if the predicate is complex and the corresponding condition
has no other waiting thread, then it is deactivated by the condition
manager.



if P is false
    if P is a complex predicate
        P := Globalization(toDNF(P))
        if P is not in condMgr
            tags = AnalyzePredicate(P)
            condMgr.registerComplexPredicate(P, tags)
    C = condMgr.getCondition(P)
    do
        condMgr.relaySignal()
        wait C
    while P is false
    if P is a complex predicate and C has no waiting thread
        condMgr.inactive(P)



Figure 6: Preprocessing for a waituntil(P) statement 
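
The shape of the translated code can be approximated in plain Java with ReentrantLock and Condition. The sketch below is a simplified stand-in under several assumptions: MiniCondMgr and TurnMonitor are hypothetical classes, there are no tags (relaySignal scans all registered predicates), the globalized predicate is identified by a string key, and an unused predicate is dropped instead of being moved to an inactive list.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.BooleanSupplier;

// Simplified stand-in for the condition manager: no tags, exhaustive scan.
final class MiniCondMgr {
    private static final class Entry {
        final BooleanSupplier pred;
        final Condition cond;
        int waiters;
        Entry(BooleanSupplier pred, Condition cond) { this.pred = pred; this.cond = cond; }
    }

    private final ReentrantLock mutex;
    private final Map<String, Entry> table = new LinkedHashMap<>();

    MiniCondMgr(ReentrantLock mutex) { this.mutex = mutex; }

    // Signals one waiting thread whose predicate is currently true, if any.
    void relaySignal() {
        for (Entry e : table.values()) {
            if (e.waiters > 0 && e.pred.getAsBoolean()) {
                e.cond.signal();
                return;
            }
        }
    }

    // Translation of waituntil(P). The caller must hold mutex; key is the
    // text of P after globalization.
    void waitUntil(String key, BooleanSupplier p) throws InterruptedException {
        if (p.getAsBoolean()) return;
        Entry e = table.computeIfAbsent(key, k -> new Entry(p, mutex.newCondition()));
        e.waiters++;
        try {
            do {
                relaySignal();
                e.cond.await();
            } while (!p.getAsBoolean());
        } finally {
            e.waiters--;
            if (e.waiters == 0) table.remove(key);  // drop the unused predicate
        }
    }
}

final class TurnMonitor {
    private final ReentrantLock mutex = new ReentrantLock();
    private final MiniCondMgr condMgr = new MiniCondMgr(mutex);
    private final StringBuilder log = new StringBuilder();
    private final int n;
    private int turn = 0;

    TurnMonitor(int n) { this.n = n; }

    void access(int id) throws InterruptedException {
        mutex.lock();                       // inserted at method entry
        try {
            condMgr.waitUntil("turn==" + id, () -> turn == id);
            log.append(id);
            turn = (turn + 1) % n;
            condMgr.relaySignal();          // relay before leaving the monitor
        } finally {
            mutex.unlock();                 // inserted before return
        }
    }

    // Starts n threads in reverse order and returns the access order.
    static String runDemo(int n) {
        final TurnMonitor m = new TurnMonitor(n);
        Thread[] ts = new Thread[n];
        for (int i = n - 1; i >= 0; i--) {
            final int id = i;
            ts[i] = new Thread(() -> {
                try { m.access(id); }
                catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
            ts[i].start();
        }
        try {
            for (Thread t : ts) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return m.log.toString();
    }
}
```

Calling TurnMonitor.runDemo(4) starts the threads in the order 3, 2, 1, 0, yet each thread appends its id only when turn equals that id, and no thread ever calls signalAll.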



5.2 AutoSynch Java library: condition manager 

The condition manager maintains the predicates and condition
variables, and provides the signaling mechanism. To avoid creating
redundant predicates and condition variables, predicates that have the
same meaning should be mapped to the same condition variable.
Two predicates are syntactically equivalent if they are identical after
applying globalization. A predicate table, which is implemented by a
hash table, records predicates and their associated condition
variables.

When a predicate is added to the condition manager, its tags are
stored in an appropriate data structure depending upon the type of
its tag. Fig. 7 shows an example. The symbol • indicates a condition
variable. The gray blank indicates that the predicate is inactive, that
is, no thread waits on it. A hash table is used for storing equivalence
tags with the shared expression x. In addition, a min-heap and a
max-heap are used for storing threshold tags.

To find a predicate that is true in Fig. 7, the value of the
shared expression x is evaluated. We first check the hash table (with
O(1) time complexity) using the value of the shared expression as
the key. If we find a tag in the hash table, then we evaluate the
predicates that have the tag. If one of them is true, then we
signal its corresponding condition variable. Otherwise, we check
the max-heap and the min-heap. If we find that both tags in the
roots are false, we search for the predicates with the None tag
exhaustively. If one of these predicates is true, we signal the
corresponding condition variable. As can be expected, the equivalence
and threshold tags are helpful for searching predicates that are true.

A predicate must be removed from the tag once no thread waits 
on it to avoid unnecessary predicate evaluation. A threshold tag also 
needs to be removed once it has no predicate. 

Predicates may be reused. Instead of removing those predicates
with no waiting thread, we move those predicates to an inactive
list. If they are used later, then we remove them from the inactive
list. Otherwise, when the length of the inactive list exceeds some
predefined threshold, we remove the oldest predicates from the list.
Note that the shared predicates are never removed since they are
static and are added only in the constructor.
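
One simple way to realize such a bounded inactive list in Java is a LinkedHashMap that evicts its eldest entry once a limit is exceeded. The class InactiveList, its method names, and the use of a string key for the globalized predicate are assumptions for this sketch, not the AutoSynch implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded inactive list keyed by the globalized predicate text.
final class InactiveList<V> {
    private final int limit;
    private final LinkedHashMap<String, V> entries;

    InactiveList(final int limit) {
        this.limit = limit;
        // Insertion-ordered map; the eldest entry is evicted past the limit.
        this.entries = new LinkedHashMap<String, V>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > InactiveList.this.limit;
            }
        };
    }

    // Called when the last waiter on a complex predicate leaves.
    void deactivate(String key, V predicate) {
        entries.put(key, predicate);
    }

    // Called when a thread waits on the predicate again.
    // Returns null if the predicate was evicted or never deactivated.
    V reactivate(String key) {
        return entries.remove(key);
    }

    int size() {
        return entries.size();
    }
}
```

A reactivated predicate keeps its condition variable and tags, so only predicates that stay unused long enough to be evicted pay the registration cost again.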

6. Evaluation 

We discuss the experiments conducted for evaluating the performance
of AutoSynch in this section. We compare the performance
of different signaling mechanisms in three sets of classical conditional
synchronization problems. The first set of problems relies on
only shared predicates for synchronization. Next, we explore the
performance for problems using complex predicates. Finally, we
evaluate the problems in which signalAll calls are required in the
explicit-signal mechanism.

6.1 Experimental environment 

All of the experiments were conducted on a machine with 16
Intel(R) Xeon(R) X5560 Quad Core CPUs (2.80 GHz) and 64 GB of
memory running Linux 2.6.18.

Our experiments are saturation tests [5], in which only the
monitor-accessing functions are performed. That is, no extra work is
done inside or outside the monitor. For every experimental setting,
we perform 25 runs and remove the best and the worst results.
Then we compare the average runtime for different signaling
mechanisms.
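
The averaging step can be written down in a few lines of Java. This is only a sketch of the stated procedure (drop the best and the worst run, average the rest); the class name TrimmedMean is hypothetical and the numbers in the usage note are made up for illustration, not measurements from the paper.

```java
import java.util.Arrays;

final class TrimmedMean {
    // Average of the runs after dropping one fastest and one slowest run.
    static double of(double[] runtimes) {
        if (runtimes.length < 3) {
            throw new IllegalArgumentException("need at least 3 runs");
        }
        double[] sorted = runtimes.clone();
        Arrays.sort(sorted);
        double sum = 0;
        for (int i = 1; i < sorted.length - 1; i++) {
            sum += sorted[i];
        }
        return sum / (sorted.length - 2);
    }
}
```

For example, the runs {5, 1, 3, 100, 4} lose 1 and 100, and the remaining values 3, 4, and 5 average to 4.0; with 25 runs the average is taken over 23 values.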

6.2 Signaling mechanisms 

Four implementations using different signaling mechanisms have 
been compared. 

Explicit-signal Using the original Java explicit-signal mechanism. 

Baseline Using the automatic-signal mechanism relying on only
one condition variable. It calls signalAll to wake every waiting
thread. Then each awakened thread re-evaluates its own predicate
after re-acquiring the monitor.

AutoSynch-T Using the approach described in this paper but ex- 
cluding predicate tagging. 

AutoSynch Using the approach described in this paper. 
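
For reference, the Baseline mechanism can be sketched in Java as a monitor with a single condition variable that calls signalAll whenever a thread leaves. BaselineMonitor and BaselineBuffer are hypothetical classes written for this illustration, not the code used in the experiments.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.BooleanSupplier;

// Baseline automatic signaling: one condition variable, signalAll on exit.
class BaselineMonitor {
    protected final ReentrantLock mutex = new ReentrantLock();
    private final Condition any = mutex.newCondition();

    // Every awakened thread re-evaluates its own predicate.
    protected void waitUntil(BooleanSupplier p) throws InterruptedException {
        while (!p.getAsBoolean()) {
            any.await();
        }
    }

    // Wakes every waiting thread, then releases the monitor.
    protected void leave() {
        any.signalAll();
        mutex.unlock();
    }
}

final class BaselineBuffer extends BaselineMonitor {
    private final int[] buff;
    private int count, head, tail;

    BaselineBuffer(int capacity) { buff = new int[capacity]; }

    void put(int v) throws InterruptedException {
        mutex.lock();
        try {
            waitUntil(() -> count < buff.length);
            buff[tail] = v;
            tail = (tail + 1) % buff.length;
            count++;
        } finally {
            leave();
        }
    }

    int take() throws InterruptedException {
        mutex.lock();
        try {
            waitUntil(() -> count > 0);
            int v = buff[head];
            head = (head + 1) % buff.length;
            count--;
            return v;
        } finally {
            leave();
        }
    }

    // One producer puts 1..100; the caller consumes and sums them.
    static int demo() {
        final BaselineBuffer b = new BaselineBuffer(2);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= 100; i++) b.put(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int sum = 0;
        try {
            for (int i = 0; i < 100; i++) sum += b.take();
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum;
    }
}
```

The programmer never names a condition variable here, but every exit wakes all waiters, which is the source of the extra context switches that the other mechanisms try to avoid.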

6.3 Test problems 

Seven conditional synchronization problems are implemented for 
evaluating our approach. 

6.3.1 Shared predicate synchronization problems 

Bounded-buffer [8, 10] This is the traditional bounded-buffer
problem. Every producer waits if the buffer is full, while every
consumer waits if the buffer is empty.

Sleeping barber [8, 10] The problem is analogous to a barbershop
with one barber. The barber has a number of waiting chairs. Every
time he finishes a haircut, he checks whether some customers
are waiting. If there are, he cuts hair for one customer. If no
customers are waiting, the barber goes to sleep. Every customer
arrives and checks what the barber is doing. If the barber is
sleeping, then the customer wakes the barber and has a haircut.
Otherwise, the customer checks whether there is any free waiting
chair. If there is, the customer waits; otherwise, the customer leaves.

H2O problem This is the simulation of water generation. Every
H atom waits if there is no O atom or no other H atom.
Every O atom waits if the number of H atoms is less than 2.

Figure 7: An example of the condition manager in AutoSynch

6.3.2 Complex predicate synchronization problems 

Round-Robin Access Pattern Every test thread accesses the mon- 
itor in round-robin order. 

Readers/Writers [7] We use the approach given in [3], where a
ticket is used to maintain the accessing order of readers and
writers. Every reader and writer gets a ticket number indicating
its arrival order. Readers and writers wait on the monitor for
their turn.

Dining philosophers [10] A number of philosophers are sitting
around a table with a dish in front of each of them and a chopstick
in between each pair of neighbors. A philosopher needs to pick up two
chopsticks at the same time for eating and he does not put down
a chopstick until he finishes eating. A philosopher that wants to
eat must wait if one of his shared chopsticks is held by another
philosopher.

6.3.3 Synchronization problems requiring signalAll in the explicit-signal mechanism

Parameterized bounded-buffer [8, 10] The parameterized
bounded-buffer problem shown in Fig. [7].

6.4 Experimental results 

Figs. 8 to 10 plot the results for the bounded-buffer, the H2O, and
the sleeping barber problem. The y-axis shows the runtime in seconds.
The x-axis represents the number of simulating threads. Note
that, in the H2O problem, only one thread simulates an O atom;
the x-axis represents the number of threads simulating H atoms.
As expected, the baseline is much slower than the other three
signaling mechanisms, which have similar performance in the
bounded-buffer problem and the H2O problem. This phenomenon can be
explained as follows. There is only a constant number of shared
predicates in waituntil statements for automatic-signal mechanisms. For
example, in the bounded-buffer problem, there are two waituntil
statements with global predicates, count > 0 (not empty condition)
and count < buff.length (not full condition). Therefore, the
complexity for signaling a thread in AutoSynch and AutoSynch-T is also
constant. Hence, both AutoSynch and AutoSynch-T are as efficient
as the explicit-signal mechanism. An interesting point is that the
baseline is as efficient as the others in the sleeping
barber problem. The reason is that the signalAll calls of the baseline
do not increase the number of context switches. Whenever a
signaled customer re-acquires the monitor, he can have a haircut
since the previous customer has had his haircut. These experiments
illustrate that the automatic-signal mechanisms are as efficient as the
explicit-signal mechanisms for synchronization problems relying
on only shared predicates.
Figure 12: The results of readers/writers problem



Figs. 11 to 13 present the experimental results for the round-robin
access pattern, the readers/writers problem, and the dining
philosophers problem. The result of the baseline is not plotted
in these figures since it is extremely inefficient in
comparison to the other mechanisms. In this set of experiments, the
explicit-signal mechanism has an advantage since it can explicitly
signal the next thread to enter the monitor. For example, in the
round-robin access pattern, an array of condition variables is used
for associating the id of each thread with its condition variable. Each
thread waits on its condition variable until its turn. When a thread
leaves the monitor, it signals the condition variable of the next
thread. As can be seen, the performance of the explicit-signal
mechanism is steady as the number of threads increases in the round-robin
access pattern and the readers/writers problem. For AutoSynch-T, the
runtime increases significantly as the number of threads increases.
AutoSynch is between 1.2 and 2.6 times slower than the explicit-signal
mechanism for the round-robin access pattern.
However, the performance of AutoSynch does not decrease as
the number of threads increases. Note that, in the readers/writers
problem, AutoSynch-T is more efficient than AutoSynch when
the number of threads is small. The reason is that AutoSynch
sacrifices performance for maintaining predicate tags. The benefit of
predicate tagging increases as the number of threads increases.
Another interesting point is that the explicit-signal
mechanism does not outperform the implicit-signal mechanisms by much
in the dining philosophers problem. The reason is that a philosopher
only competes with the two other philosophers sitting next to him
even when the number of philosophers increases.



[Plot; x-axis: # threads; series: explicit, AutoSynch-T, AutoSynch]

Figure 11: The results of round-robin access pattern

Figure 13: The results of dining philosophers problem

Table 1 presents the CPU usage (profiled by YourKit [20]) for
the round-robin access pattern with 128 threads. The relaySignal
is the process of deciding which thread should be signaled in
both AutoSynch and AutoSynch-T. Tag Mger is the computation
for maintaining predicate tags in AutoSynch. As can be seen,
predicate tagging significantly improves the process for finding
a predicate that is true. The CPU time of the relaySignal process is
reduced by 95% at a slight cost in tag management.

In Fig. 14, we compare the results of the parameterized
bounded-buffer in which signalAll calls are required in the explicit-signal
mechanism. In this experiment, there is one producer, which
randomly puts 1 to 128 items every time. The x-axis indicates the
number of consumers. Every consumer randomly takes 1 to 128 items
every time. As can be seen, the performance of the explicit-signal
mechanism decreases as the number of consumers increases.
AutoSynch outperforms the explicit-signal mechanism by 26.9 times
when the number of threads is 256. This can be explained by Fig. 15,
which depicts the number of context switches. The number of
context switches increases in the explicit-signal mechanism, reaching
around 2.7 million when the number of threads
is 256. However, the number of context switches is stable in
AutoSynch even as the number of threads increases. It has around 5440
context switches when the number of threads is 256. This
experiment demonstrates that the number of context switches can be
dramatically reduced and the performance can be increased in
AutoSynch for the problems requiring signalAll calls in the
explicit-signal mechanism.
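
The reason explicit signaling falls back to signalAll in this problem can be seen in a short Java sketch. Consumers waiting on the same condition variable need different numbers of items, so waking a single arbitrary waiter could pick one whose request still cannot be met. The class ParamBoundedBuffer is hypothetical, it tracks only the item count instead of the items themselves, and the demo uses fixed request sizes in place of the random sizes used in the experiment.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Explicit-signal parameterized bounded buffer (item count only).
final class ParamBoundedBuffer {
    private final ReentrantLock mutex = new ReentrantLock();
    private final Condition insufficientSpace = mutex.newCondition();
    private final Condition insufficientItems = mutex.newCondition();
    private final int capacity;
    private int count;

    ParamBoundedBuffer(int capacity) { this.capacity = capacity; }

    void put(int n) throws InterruptedException {
        mutex.lock();
        try {
            while (count + n > capacity) insufficientSpace.await();
            count += n;
            // Waiters need different amounts, so all of them must re-check.
            insufficientItems.signalAll();
        } finally {
            mutex.unlock();
        }
    }

    void take(int n) throws InterruptedException {
        mutex.lock();
        try {
            while (count < n) insufficientItems.await();
            count -= n;
            insufficientSpace.signalAll();
        } finally {
            mutex.unlock();
        }
    }

    private static Thread worker(final ParamBoundedBuffer b, final boolean producer) {
        return new Thread(() -> {
            try {
                if (producer) {
                    for (int i = 0; i < 30; i++) b.put(10);   // 300 items in total
                } else {
                    for (int i = 0; i < 4; i++) b.take(25);   // 100 items per consumer
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    // One producer and three consumers; returns the final item count.
    static int demo() {
        final ParamBoundedBuffer b = new ParamBoundedBuffer(128);
        Thread[] ts = { worker(b, true), worker(b, false), worker(b, false), worker(b, false) };
        for (Thread t : ts) t.start();
        try {
            for (Thread t : ts) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return b.count;
    }
}
```

Each signalAll wakes every waiting consumer even when at most one of them can proceed, which is consistent with the growth in context switches reported for the explicit-signal mechanism.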

7. Conclusions 

In this paper, we have proposed the AutoSynch framework, which supports
the automatic-signal mechanism with the AutoSynch class and the waituntil
statement. AutoSynch uses the globalization operation to enable
complex predicate evaluation in every thread. Next, it provides
relay invariance, which ensures that some thread waiting for a
condition that has been met is always signaled, to avoid signalAll calls.
AutoSynch also uses predicate tags to accelerate the process of
deciding which thread should be signaled.

To evaluate the effectiveness of AutoSynch, we built a prototype
implementation using JavaCC [17] (Java Compiler Compiler)
and applied it to seven conditional synchronization problems. The
experimental results indicate that AutoSynch implementations of
these problems perform significantly better than other
automatic-signal monitors. Even though AutoSynch is around 2.6 times slower
than the explicit-signal mechanism in the worst case of our experiments,
AutoSynch is around 26.9 times faster than the explicit-signal
mechanism in the parameterized bounded-buffer problem, which relies
on signalAll calls.

In the future, we plan to optimize our framework by using
architecture information. For example, we can get the number
of cores of a machine, and then limit the number of executing
threads to avoid unnecessary contention. Our current implementation
of AutoSynch is built upon constructs provided by Java. Thus,
there is a possibility of further performance improvement if the
approach were implemented within the JVM.